Chinese Discourse Segmentation Based on Punctuation Marks

Authors

  • Yancui Li
  • Hongyu Feng
  • Wenhe Feng
Abstract

This paper addresses Chinese discourse segmentation based on punctuation marks. Specifically, we propose lexical, syntactic, positional, and punctuation features to train classifiers for Chinese discourse segmentation. Experimental results on the CDTB (Chinese Discourse Treebank) show that our punctuation-based method is appropriate for Chinese discourse segmentation, achieving 89.2% accuracy.
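
As a rough illustration of the general idea only (not the authors' implementation), the sketch below treats each punctuation mark as a candidate segment boundary and builds a small feature dictionary for it, which a classifier could then label as boundary or non-boundary. The set of marks, the function names, and the features themselves are all assumptions made for this example; the abstract does not specify the actual feature set, and the lexical and syntactic features it mentions are not modeled here.

```python
# Hypothetical sketch: the mark inventory, function names, and features are
# illustrative assumptions, not the feature set used in the paper.
CANDIDATE_MARKS = set("，。；：？！")


def candidate_boundaries(text):
    """Return the index of every punctuation mark that could end a segment."""
    return [i for i, ch in enumerate(text) if ch in CANDIDATE_MARKS]


def boundary_features(text, index, prev_index):
    """Build a feature dict for the candidate mark at `index`.

    `prev_index` is the index of the previous candidate mark, or -1 if none.
    """
    return {
        # Punctuation feature: which mark this is.
        "mark": text[index],
        # Position feature: how far into the text the mark occurs.
        "relative_position": index / len(text),
        # Length of the span between the previous mark and this one.
        "chars_since_previous_mark": index - prev_index - 1,
        # Crude lexical context: the character that follows the mark.
        "next_char": text[index + 1] if index + 1 < len(text) else "<END>",
    }


def extract_instances(text):
    """Turn a text into one feature dict per candidate punctuation mark."""
    instances = []
    prev = -1
    for i in candidate_boundaries(text):
        instances.append(boundary_features(text, i, prev))
        prev = i
    return instances


if __name__ == "__main__":
    for instance in extract_instances("他很累，但是他继续工作。"):
        print(instance)
```

Each resulting dictionary would be one training instance; pairing it with a gold boundary label from an annotated corpus and feeding it to any standard classifier is left out of the sketch.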

Related resources

Discursive Usage of Six Chinese Punctuation Marks

Both rhetorical structure and punctuation have been helpful in discourse processing. Based on a corpus annotation project, this paper reports the discursive usage of 6 Chinese punctuation marks in news commentary texts: Colon, Dash, Ellipsis, Exclamation Mark, Question Mark, and Semicolon. The rhetorical patterns of these marks are compared against patterns around cue phrases in general. Result...

Application of Chinese Natural Language Generation in Semantic Web

RDF is the representation of the Semantic Web. When querying RDF documents, the result is a sub-graph of the RDF data model or a number of triple statements. In this paper, we apply natural language generation techniques to render the result into multi-sentential text for human comprehension. We investigate the effect of discourse segmentation on the generation of anaphora and punctuation marks in C...

Clause-based Discourse Segmentation of Arabic Texts

This paper describes a rule-based approach to segmenting Arabic texts into clauses. Our method relies on an extensive analysis of a large set of lexical cues as well as punctuation marks. Our analysis was carried out on two different corpus genres: news articles and elementary school textbooks. We propose a three-step segmentation algorithm: first by using only punctuation marks, then by relying ...

Information-based Aspects of Punctuation

We offer a preliminary account of the information-based aspects of punctuation marks. We give our initial treatment within Discourse Representation Theory and its segmented version. We hypothesize that this work will be useful in classifying the informational contributions of punctuation marks and bringing them to bear on the semantic characterization of written discourse.

On Generalized-Topic-Based Chinese Discourse Structure

Song Rou, Jiang Yuru, Wang Jingyi. Beijing Language and Culture University; Beijing University of Polytechnic Technology; Beijing Forest University; Beijing University of Information Science and Technology. Abstract: Due to the lack of external formal marks, components in Chinese discourse can hardly be categorized into the traditional syntactic system. In fact, Chinese is a typical topic-prominent la...

Publication date: 2015